Active network management for electrical distribution systems: problem formulation, benchmark, and approximate solution
With the increasing share of renewable and distributed generation in
electrical distribution systems, Active Network Management (ANM) becomes a
valuable option for a distribution system operator to operate its system in a
secure and cost-effective way without relying solely on network reinforcement.
ANM strategies are short-term policies that control the power injected by
generators and/or taken off by loads in order to avoid congestion or voltage
issues. Advanced ANM strategies imply that the system operator has to solve
large-scale optimal sequential decision-making problems under uncertainty. For
example, decisions taken at a given moment constrain the future decisions that
can be taken and uncertainty must be explicitly accounted for because neither
demand nor generation can be accurately forecasted. We first formulate the ANM
problem, which, in addition to being sequential and uncertain, has a nonlinear
nature stemming from the power flow equations and a discrete nature arising
from the activation of power modulation signals. This ANM problem is then cast
as a stochastic mixed-integer nonlinear program, as well as second-order cone
and linear counterparts, for which we provide quantitative results using
state-of-the-art solvers and perform a sensitivity analysis over the size of the
system, the amount of available flexibility, and the number of scenarios
considered in the deterministic equivalent of the stochastic program. To foster
further research on this problem, we make available at
http://www.montefiore.ulg.ac.be/~anm/ three test beds based on distribution
networks of 5, 33, and 77 buses. These test beds contain a simulator of the
distribution system, with stochastic models for the generation and consumption
devices, and callbacks to implement and test various ANM strategies.
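The two-stage structure described above can be sketched on a toy example: a single first-stage curtailment decision is taken before uncertainty resolves, and its cost is evaluated as an average over demand scenarios, as in a deterministic equivalent. All quantities below (capacities, costs, the uniform demand model) are illustrative placeholders, not the paper's test beds.

```python
import numpy as np

# Toy two-stage sketch (not the paper's model): a first-stage curtailment
# level x in [0, 1] is chosen before uncertainty resolves; each scenario
# then incurs a penalty if the residual injection overloads a line of
# capacity CAP. All names and values are hypothetical.
rng = np.random.default_rng(0)
CAP = 8.0            # line capacity (MW), assumed
GEN = 10.0           # forecast generation (MW), assumed
CURT_COST = 5.0      # cost per MW curtailed
PENALTY = 50.0       # cost per MW of overload
demand = rng.uniform(0.0, 4.0, size=1000)   # demand scenarios (MW)

def expected_cost(x):
    """Deterministic equivalent: average cost over the sampled scenarios."""
    injection = GEN * (1.0 - x) - demand           # net line flow per scenario
    overload = np.maximum(injection - CAP, 0.0)
    return CURT_COST * GEN * x + PENALTY * overload.mean()

# The first-stage decision is one-dimensional here, so a grid search suffices.
grid = np.linspace(0.0, 1.0, 101)
best_x = grid[np.argmin([expected_cost(x) for x in grid])]
```

Increasing the number of sampled scenarios tightens the approximation of the true expectation, which is the trade-off the abstract's sensitivity analysis explores.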
An Optimal Control Formulation of Pulse-Based Control Using Koopman Operator
In many applications, and in systems/synthetic biology in particular, it is
desirable to compute control policies that force the trajectory of a bistable
system from one equilibrium (the initial point) to another equilibrium (the
target point), or in other words to solve the switching problem. It was
recently shown that, for monotone bistable systems, this problem admits
easy-to-implement open-loop solutions in terms of temporal pulses (i.e., step
functions of fixed length and fixed magnitude). In this paper, we develop this
idea further and formulate a problem of convergence to an equilibrium from an
arbitrary initial point. We show that this problem can be solved using a static
optimization problem in the case of monotone systems. Changing the initial
point to an arbitrary state makes it possible to build closed-loop, event-based, or
open-loop policies for the switching/convergence problems. In our derivations
we exploit the Koopman operator, which offers a linear infinite-dimensional
representation of an autonomous nonlinear system. One of the main advantages of
using the Koopman operator is the powerful computational tools developed for
this framework. Besides admitting numerical solutions, the
switching/convergence problem can also serve as a building block for solving
more complicated control problems and can potentially be applied to
non-monotone systems. We illustrate this argument on the problem of
synchronizing cardiac cells by defibrillation. Potentially, our approach can be
extended to problems with different parametrizations of control signals since
the only fundamental limitation is the finite time application of the control
signal.
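The Koopman operator mentioned above is commonly approximated from data. A minimal sketch, using extended dynamic mode decomposition (EDMD) on the classic one-dimensional bistable system dx/dt = x - x^3 with an assumed monomial dictionary; this illustrates the general computational tool, not the paper's specific algorithm.

```python
import numpy as np

# EDMD sketch of a finite-dimensional Koopman approximation for the
# bistable system dx/dt = x - x^3 (stable equilibria at -1 and +1).
DT = 0.01
def step(x):
    return x + DT * (x - x**3)          # forward-Euler flow map

def dictionary(x):
    # Monomial observables 1, x, x^2, x^3 (a tiny, assumed dictionary).
    return np.stack([np.ones_like(x), x, x**2, x**3], axis=1)

# Snapshot pairs (x_k, x_{k+1}) from many initial conditions.
x0 = np.linspace(-2.0, 2.0, 400)
x1 = step(x0)
Psi0, Psi1 = dictionary(x0), dictionary(x1)

# Least-squares fit of the Koopman matrix K: Psi(x_{k+1}) ~ Psi(x_k) @ K.
K, *_ = np.linalg.lstsq(Psi0, Psi1, rcond=None)
eigvals = np.linalg.eigvals(K)
```

The constant observable is invariant under the dynamics, so the fitted matrix has an eigenvalue at 1; richer dictionaries trade approximation quality for the size of the linear representation.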
How to Discount Deep Reinforcement Learning: Towards New Dynamic Strategies
Using deep neural networks as function approximators for reinforcement
learning tasks has recently been shown to be very powerful for solving problems
approaching real-world complexity. Using these results as a benchmark, we
discuss the role that the discount factor may play in the quality of the
learning process of a deep Q-network (DQN). We empirically show that
progressively increasing the discount factor up to its final value can
significantly reduce the number of learning steps. When used in
conjunction with a varying learning rate, we empirically show that it
outperforms the original DQN in several experiments. We relate this phenomenon to
the instabilities of neural networks when they are used in an approximate
Dynamic Programming setting. We also describe the possibility of falling into a
local optimum during the learning process, thus connecting our discussion with
the exploration/exploitation dilemma. Comment: NIPS 2015 Deep Reinforcement Learning Workshop
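The progressive-discount idea above can be sketched as a simple schedule that shrinks the gap between the current discount factor and 1 at every update, clipped at the final value. The initial value, final value, and contraction rate below are illustrative choices, not the paper's exact hyperparameters.

```python
# Sketch of a progressively increasing discount factor: gamma starts low
# and is pushed toward its final value at every update. All constants
# here are assumed, not taken from the paper.
def gamma_schedule(gamma0=0.9, gamma_final=0.99, rate=0.98, steps=200):
    gammas, g = [], gamma0
    for _ in range(steps):
        gammas.append(g)
        # Shrink the gap to 1 geometrically, then clip at the final value.
        g = min(gamma_final, 1.0 - rate * (1.0 - g))
    return gammas

gammas = gamma_schedule()
```

Early on, the small discount keeps bootstrapped targets short-horizon and stable; the horizon then lengthens as the value estimates improve.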
Min Max Generalization for Two-stage Deterministic Batch Mode Reinforcement Learning: Relaxation Schemes
We study the minmax optimization problem introduced in [22] for computing
policies for batch mode reinforcement learning in a deterministic setting.
First, we show that this problem is NP-hard. In the two-stage case, we provide
two relaxation schemes. The first relaxation scheme works by dropping some
constraints in order to obtain a problem that is solvable in polynomial time.
The second relaxation scheme, based on a Lagrangian relaxation where all
constraints are dualized, leads to a conic quadratic programming problem. We
also theoretically prove and empirically illustrate that both relaxation
schemes provide better results than those given in [22].
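The principle behind the second relaxation scheme, dualizing constraints to obtain a tractable bounding problem, can be illustrated on a toy problem unrelated to the paper's minmax program: minimize x^2 subject to x >= 1, whose optimum is 1.

```python
# Toy Lagrangian relaxation (illustration only): dualizing the constraint
# x >= 1 in "minimize x^2" yields a lower bound on the constrained optimum.
def dual_function(lam):
    # g(lam) = min_x [x^2 + lam * (1 - x)]; the inner minimum is at x = lam / 2.
    x = lam / 2.0
    return x**2 + lam * (1.0 - x)      # = lam - lam**2 / 4

# Weak duality: g(lam) <= 1 for every lam >= 0; here the bound is tight at lam = 2.
bounds = [dual_function(lam) for lam in (0.0, 1.0, 2.0, 3.0)]
```

In the paper's setting the dualized problem is a conic quadratic program rather than this closed-form scalar function, but the bounding mechanism is the same.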
Benchmarking for Bayesian Reinforcement Learning
In the Bayesian Reinforcement Learning (BRL) setting, agents try to maximise
the rewards collected while interacting with their environment, making use of
prior knowledge that is available beforehand. Many BRL algorithms have already
been proposed, but even though a few toy examples exist in the literature,
there are still no extensive or rigorous benchmarks to compare them. The paper
addresses this problem, and provides a new BRL comparison methodology along
with the corresponding open source library. In this methodology, a comparison
criterion that measures the performance of algorithms on large sets of Markov
Decision Processes (MDPs) drawn from some probability distributions is defined.
In order to enable the comparison of non-anytime algorithms, our methodology
also includes a detailed analysis of the computation time requirement of each
algorithm. Our library is released with all source code and documentation: it
includes three test problems, each of which has two different prior
distributions, and seven state-of-the-art RL algorithms. Finally, our library
is illustrated by comparing all the available algorithms and the results are
discussed.
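The comparison criterion described above, scoring an agent by its average performance over MDPs drawn from a prior distribution, can be sketched on a deliberately simple stand-in: two-armed Bernoulli bandits with uniform priors on the arm means, and a basic epsilon-greedy agent. Neither the tasks nor the agent correspond to the library's contents.

```python
import numpy as np

# Toy sketch of the benchmark criterion: empirical mean return over many
# "MDPs" (here, two-armed Bernoulli bandits) drawn from a prior.
rng = np.random.default_rng(1)

def draw_bandit():
    return rng.beta(1.0, 1.0, size=2)   # true success probability of each arm

def run_agent(probs, horizon=50):
    """Epsilon-greedy agent on one sampled bandit; returns total reward."""
    counts, sums, total = np.ones(2), np.zeros(2), 0.0
    for _ in range(horizon):
        if rng.random() < 0.1:
            arm = int(rng.integers(2))              # explore
        else:
            arm = int(np.argmax(sums / counts))     # exploit empirical means
        r = float(rng.random() < probs[arm])
        counts[arm] += 1
        sums[arm] += r
        total += r
    return total

# The comparison criterion: average score over the MDP distribution.
score = np.mean([run_agent(draw_bandit()) for _ in range(200)])
```

Averaging over freshly drawn MDPs is what distinguishes this criterion from single-environment benchmarks: it measures how well an algorithm exploits the prior, not one particular task.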
On the Fairness of Centralised Decision-Making Strategies in multi-TSO Power Systems
In this paper, we consider an interconnected power system, where the different Transmission System Operators (TSOs) have agreed to transfer some of their competences to a Centralised Control Center (CCC). In such a context, a recurrent difficulty for the CCC is to define decision-making strategies that are fair to every TSO of the interconnected system. We address this multiobjective problem when the objective of every TSO can be represented by a real-valued function. We propose an algorithm that selects the solution minimising the distance to the utopian minimum after the different objectives have been normalised. We analyse the fairness of this solution in the economic sense. We illustrate the approach on the IEEE 118-bus system partitioned into three areas, each having as local objective the minimisation of active power losses, the maximisation of reactive power reserves, or a combination of both criteria.
Keywords: multi-area power system, centralised control, multi-objective optimisation, fairness.
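The selection rule described above can be sketched in a few lines: normalise each TSO's objective to [0, 1] over the candidate decisions, build the utopian point from the per-objective minima, and pick the candidate closest to it. The candidate cost matrix below is made-up, not the IEEE 118-bus case, and assumes each objective varies across candidates so the normalisation is well defined.

```python
import numpy as np

# Minimal sketch of the utopian-distance selection rule.
costs = np.array([          # rows: candidate decisions, cols: TSO objectives
    [10.0, 8.0, 6.0],
    [4.0, 9.0, 7.0],
    [6.0, 5.0, 5.0],
])

lo, hi = costs.min(axis=0), costs.max(axis=0)
norm = (costs - lo) / (hi - lo)          # each objective rescaled to [0, 1]
utopia = norm.min(axis=0)                # all zeros after normalisation
dists = np.linalg.norm(norm - utopia, axis=1)
chosen = int(np.argmin(dists))
```

Normalising first is what makes the rule fair across TSOs: without it, an objective measured in larger units would dominate the distance.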